Bayesian exploration for approximate dynamic programming
Abstract
Approximate dynamic programming (ADP) is a general methodological framework for multistage stochastic optimization problems in transportation, finance, energy, and other applications where scarce resources must be allocated optimally. We propose a new approach to the exploration/exploitation dilemma in ADP. First, we show how a Bayesian belief structure can be used to express uncertainty about the value function in ADP. Bayesian models can be integrated into both parametric and non-parametric value function approximations, which is vital for practical implementation. Second, we propose a new exploration strategy, based on the concept of value of information from the optimal learning literature, and prove that it systematically explores the state space. We evaluate this strategy using a variety of distinct resource allocation problems and demonstrate that it is highly competitive against other exploration strategies.
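To make the value-of-information idea concrete, here is a minimal sketch. It assumes independent normal beliefs about the values of a few discrete states and a known observation-noise variance, and it uses the standard knowledge-gradient formula from the optimal learning literature. The function names (`update_belief`, `knowledge_gradient`) and the numbers are hypothetical illustrations, and this is not necessarily the exact strategy proposed in the paper.

```python
import math


def normal_pdf(z):
    """Standard normal density."""
    return math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)


def normal_cdf(z):
    """Standard normal cumulative distribution function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def update_belief(mean, var, observation, noise_var):
    """Conjugate normal update of one value estimate (assumed known noise variance)."""
    precision = 1.0 / var + 1.0 / noise_var
    new_mean = (mean / var + observation / noise_var) / precision
    return new_mean, 1.0 / precision


def knowledge_gradient(means, variances, noise_var):
    """Expected one-step improvement in the best posterior mean from one
    noisy observation of each alternative (independent normal beliefs)."""
    scores = []
    for i, (m, v) in enumerate(zip(means, variances)):
        best_other = max(mu for j, mu in enumerate(means) if j != i)
        # Standard deviation of the change in the posterior mean.
        sigma_tilde = v / math.sqrt(v + noise_var)
        zeta = -abs(m - best_other) / sigma_tilde
        scores.append(sigma_tilde * (zeta * normal_cdf(zeta) + normal_pdf(zeta)))
    return scores


# Hypothetical beliefs about three states: equal means, different uncertainty.
means = [1.0, 1.0, 1.0]
variances = [1.0, 4.0, 0.25]
scores = knowledge_gradient(means, variances, noise_var=1.0)
explore_next = max(range(len(scores)), key=scores.__getitem__)
```

With equal means, the score is largest for the state whose value is most uncertain, so the rule prefers to visit that state; when the means differ, the score trades off uncertainty against how close a state's estimate is to the current best.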
Similar resources
Linear Bayesian Reinforcement Learning
This paper proposes a simple linear Bayesian approach to reinforcement learning. We show that with an appropriate basis, a Bayesian linear Gaussian model is sufficient for accurately estimating the system dynamics, and in particular when we allow for correlated noise. Policies are estimated by first sampling a transition model from the current posterior, and then performing approximate dynamic ...
Cost Analysis of Acceptance Sampling Models Using Dynamic Programming and Bayesian Inference Considering Inspection Errors
Acceptance sampling models are widely applied in companies to inspect and test raw materials as well as final products. Because many lots of items are produced each day in industry, it may be impossible to inspect or test every item in a lot. The acceptance sampling models only provide a guarantee to the producer and consumer that the items in the lots are acco...
Approximate Incremental Dynamic Analysis Using Reduction of Ground Motion Records
Incremental dynamic analysis (IDA) requires the analysis of the non-linear response history of a structure for an ensemble of ground motions, each scaled to multiple levels of intensity and selected to cover the entire range of structural response. Recognizing that IDA of practical structures is computationally demanding, an approximate procedure based on the reduction of the number of ground m...
Cover tree Bayesian reinforcement learning
This paper proposes an online tree-based Bayesian approach for reinforcement learning. For inference, we employ a generalised context tree model. This defines a distribution on multivariate Gaussian piecewise-linear models, which can be updated in closed form. The tree structure itself is constructed using the cover tree method, which remains efficient in high dimensional spaces. We combine the...
Bayesian Optimization with a Finite Budget: An Approximate Dynamic Programming Approach
We consider the problem of optimizing an expensive objective function when a finite budget of total evaluations is prescribed. In that context, the optimal solution strategy for Bayesian optimization can be formulated as a dynamic programming instance. This results in a complex problem with uncountable, dimension-increasing state space and an uncountable control space. We show how to approximat...
Publication date: 2015